Errors and Linkage Disequilibrium Interact Multiplicatively When Computing Sample Sizes for Genetic Case-Control Association Studies

Authors

  • Derek Gordon
  • Mark A. Levenstien
  • Stephen J. Finch
  • Jürg Ott
Abstract

Single nucleotide polymorphisms (SNPs) may be used in case-control designs to test for association between a SNP marker and a disease. Such designs may assume that the genotype data are reported without error. Our goal is to quantify the effects that errors have on sample size for case-control studies with haplotypes formed by a disease locus and a SNP marker locus in the presence of linkage disequilibrium (LD). We consider the effects of a recently published error model on 2 × 3 chi-square analysis. We study the joint relation of LD and errors with sample size for three specific genetic disease models and two settings each of marker allele frequencies (a total of 6 studies). The minimal sample size necessary for fixed asymptotic power is estimated as a fourth-degree polynomial in the variables S (error) and D' (LD measure) via a backward step-wise regression. We find that increased error rates lower power. In all studies, we observe that LD and errors interact in a non-linear fashion. In particular, regression analyses show that several higher-order interaction terms have coefficients significantly different from 0 in each study, with the fraction of variance explained greater than 0.9999. Finally, the increase in sample size necessary to maintain constant asymptotic power and level of significance as a function of S is smallest when D' = 1 (perfect LD). The increase grows monotonically as D' decreases to 0.5 for all studies.
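The abstract does not define the LD measure D'. The sketch below assumes the standard standardized disequilibrium coefficient for two di-allelic loci (D divided by its maximum attainable magnitude given the allele frequencies). The function name `d_prime` and its argument names are illustrative and do not come from the paper.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Signed standardized LD coefficient D' for two di-allelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    (Assumed definition: the usual D / D_max normalization.)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")

    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b

    # Largest magnitude D can reach for these allele frequencies;
    # the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    return d / d_max


if __name__ == "__main__":
    # Hypothetical frequencies chosen only for illustration.
    print(d_prime(0.20, 0.2, 0.5))  # allele A occurs only with B
    print(d_prime(0.15, 0.2, 0.5))  # weaker association
    print(d_prime(0.10, 0.2, 0.5))  # independent loci
```

Under this definition, allele frequencies of 0.2 and 0.5 with a haplotype frequency of 0.2 correspond to D' = 1 (perfect LD), while a haplotype frequency of 0.15 corresponds to D' = 0.5, the lower end of the range the abstract discusses.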

Similar resources

Sample Size and Statistical Power Calculation in Genetic Association Studies

A sample size with sufficient statistical power is critical to the success of genetic association studies to detect causal genes of human complex diseases. Genome-wide association studies require much larger sample sizes to achieve an adequate statistical power. We estimated the statistical power with increasing numbers of markers analyzed and compared the sample sizes that were required in cas...


Power and sample size calculations for case-control genetic association tests when errors are present: application to single nucleotide polymorphisms.

The purpose of this work is to quantify the effects that errors in genotyping have on power and the sample size necessary to maintain constant asymptotic Type I and Type II error rates (SSN) for case-control genetic association studies between a disease phenotype and a di-allelic marker locus, for example a single nucleotide polymorphism (SNP) locus. We consider the effects of three published m...


The Pattern of Linkage Disequilibrium in Livestock Genome

Linkage disequilibrium (LD) is the basis of genomic selection, genomic marker imputation, marker-assisted selection (MAS), quantitative trait loci (QTL) mapping, parentage testing and whole-genome association studies. Particular alleles at closely located loci tend to be co-inherited. At linked loci, this pattern leads to an association between alleles in the population, which is known as LD. Two metr...


Power and SNP tagging in whole mitochondrial genome association studies.

The application of genetic association studies to detect mitochondrial variants responsible for phenotypic variation has recently been demonstrated. However, the only power estimates currently available are based on the use of mitochondrial haplogroups, which can only tag a small fraction of the common variation in the mitochondrial genome. Here, power estimates are derived for a SNP-based stud...


Comparing the Efficacy of SNP Filtering Methods for Identifying a Single Causal SNP in a Known Association Region

Genome-wide association studies have successfully identified associations between common diseases and a large number of single nucleotide polymorphisms (SNPs) across the genome. We investigate the effectiveness of several statistics, including p-values, likelihoods, genetic map distance and linkage disequilibrium between SNPs, in filtering SNPs in several disease-associated regions. We use simu...


Journal:
  • Pacific Symposium on Biocomputing

Publication date: 2003